Skip to content

Return reasoning content from model output#112

Merged
OhYee merged 3 commits into
mainfrom
support_reasoning_content
Jun 2, 2026
Merged

Return reasoning content from model output#112
OhYee merged 3 commits into
mainfrom
support_reasoning_content

Conversation

@zoeshawwang
Copy link
Copy Markdown
Collaborator

Summary

  • Remove protocol-level MODEL_PARAMETER_RULES gating from OpenAI Chat Completions and AG-UI reasoning output conversion.
  • Emit reasoning only when the returned reasoning_content payload is non-empty.
  • Keep reasoning extraction from additional_kwargs.reasoning_content, while stripping the nested transport-only field from protocol payloads.

Validation

  • uv run --python 3.11 --dev --extra server pytest -q tests/unittests/server/test_openai_protocol.py tests/unittests/server/test_agui_protocol.py tests/unittests/server/test_reasoning.py
  • uv run --python 3.11 --dev --extra server pytest -q tests/unittests/server
  • uv run --python 3.11 --dev --extra server ruff check agentrun/server/openai_protocol.py agentrun/server/agui_protocol.py tests/unittests/server/test_openai_protocol.py tests/unittests/server/test_agui_protocol.py
  • git diff --check

Notes

MODEL_PARAMETER_RULES can still be used by the runtime/model-call layer to control whether the model thinks. The SDK response protocol layer now trusts the actual model output: if reasoning is returned, it is surfaced; if reasoning is absent or empty, no reasoning field/event is emitted.

Protocol conversion should use the returned reasoning payload as the source of truth. MODEL_PARAMETER_RULES can still control model-side thinking, but OpenAI and AG-UI responses should not hide non-empty reasoning_content when the env flag says thinking is false.

Constraint: User requested removal of protocol-level thinking_enabled = is_thinking_enabled_from_env() gating.

Rejected: Keep env-based response suppression | runtime parameters can drift from the model output and hide returned reasoning.

Confidence: high

Scope-risk: narrow

Directive: Do not reintroduce protocol-level reasoning env gates; only emit reasoning when returned reasoning_content is non-empty.

Tested: uv run --python 3.11 --dev --extra server pytest -q tests/unittests/server/test_openai_protocol.py tests/unittests/server/test_agui_protocol.py tests/unittests/server/test_reasoning.py

Tested: uv run --python 3.11 --dev --extra server pytest -q tests/unittests/server

Tested: uv run --python 3.11 --dev --extra server ruff check agentrun/server/openai_protocol.py agentrun/server/agui_protocol.py tests/unittests/server/test_openai_protocol.py tests/unittests/server/test_agui_protocol.py

Tested: git diff --check

Change-Id: I638efa7ca19bf8ed9417fb1922d43205d4d52b65
Not-tested: Remote GitHub CI result pending after push.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adjusts the OpenAI Chat Completions and AG-UI protocol handlers to surface reasoning_content based on actual model output (non-empty) rather than gating at the protocol layer via MODEL_PARAMETER_RULES. It also strips transport-only additional_kwargs.reasoning_content from emitted protocol payloads while still promoting it into the standardized reasoning field/event when present.

Changes:

  • Removed protocol-layer MODEL_PARAMETER_RULES gating for reasoning emission in OpenAI and AG-UI protocol conversions.
  • Promoted reasoning_content from additional_kwargs (and stripped the nested transport field), emitting reasoning only when non-empty.
  • Updated unit tests to assert reasoning is included even when MODEL_PARAMETER_RULES disables thinking.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
agentrun/server/openai_protocol.py Removes protocol gating; promotes and cleans reasoning fields in OpenAI streaming/non-stream responses.
agentrun/server/agui_protocol.py Removes protocol gating; emits AG-UI reasoning events when non-empty and strips transport-only reasoning fields from additions.
tests/unittests/server/test_openai_protocol.py Updates/extends unit coverage for reasoning emission and promotion when thinking is disabled.
tests/unittests/server/test_agui_protocol.py Updates unit coverage for reasoning events and ordering when thinking is disabled.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 344 to +348
if event.event == EventType.REASONING:
if thinking_enabled:
reasoning_content = event.data.get("delta", "")
if reasoning_content:
has_text = True
yield self._build_chunk(
context,
{"reasoning_content": reasoning_content},
)
reasoning_content = event.data.get("delta", "")
if reasoning_content:
has_text = True
yield self._build_chunk(
Comment on lines 480 to 484
if event.event == EventType.TEXT:
content_parts.append(event.data.get("delta", ""))
reasoning_content = get_reasoning_content(event.addition or {})
if thinking_enabled and reasoning_content:
if reasoning_content:
reasoning_parts.append(reasoning_content)
The protocol handlers now surface reasoning_content when the model output contains it, independent of MODEL_PARAMETER_RULES. Keep the e2e contract aligned with that behavior so the false thinking flag case verifies passthrough instead of suppression.

Constraint: Requirement moved env thinking control to model invocation, while protocol conversion follows actual returned reasoning_content.

Rejected: Restore protocol-level env gating | It would contradict the current reasoning_content passthrough requirement.

Confidence: high

Scope-risk: narrow

Directive: Do not reintroduce protocol-level MODEL_PARAMETER_RULES gating for reasoning_content without a new product decision.

Tested: uv sync --python 3.10 --dev --all-extras; uv run --python 3.10 pytest tests/e2e/test_reasoning_protocol.py -q; git diff --check

Change-Id: I806b351980a716b4e0395414b0896ddd603747bc
Not-tested: Full cloud-backed e2e suite locally; waiting for GitHub Actions E2E.
The previous E2E run passed all reasoning protocol checks but failed one cloud sandbox call with RemoteDisconnected from the service side. Trigger a fresh run without changing behavior.

Constraint: GitHub integration cannot rerun failed jobs: Resource not accessible by integration.

Rejected: Change sandbox implementation or test expectation | The failure is outside the reasoning_content change and occurred after the same cloud test had already passed its async variant.

Confidence: medium

Scope-risk: narrow

Directive: Remove this empty retry commit before merge if maintainers prefer a cleaner history, or squash it with the e2e expectation commit.

Tested: Prior run 26796842299 showed tests/e2e/test_reasoning_protocol.py all passed before one sandbox service RemoteDisconnected failure.

Change-Id: Idd9a2618187c61b6bbfb360c6f8de2cfb719974a
Not-tested: Fresh GitHub E2E result pending.
Copy link
Copy Markdown
Member

@OhYee OhYee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: PR #112 — Return reasoning content from model output

结论:LGTM ✅

整体思路正确:MODEL_PARAMETER_RULES 应该控制发给模型的参数(是否启用 thinking),而不应该在协议层过滤模型已经返回的 reasoning 输出。如果模型返回了 reasoning content,SDK 应该如实透传。

改动要点

  1. 关注点分离清晰is_thinking_enabled_from_env() 从 OpenAI / AG-UI 协议层完全移除,不再做输出侧 gating
  2. 重命名恰当_apply_reasoning_gate_promote_reasoning_content,语义更准确:从 additional_kwargs 提取 → 清理传输字段 → 提升到顶层
  3. AG-UI _strip_reasoning_from_addition — 移除 thinking_enabled 参数后,空 dict 统一返回 None,更干净
  4. 测试全面 — 单元测试和 E2E 测试都已更新,覆盖 thinking disabled 场景下仍然输出 reasoning 的新行为

关于 Copilot 的两个 comment

  • sent_role 问题:这是 pre-existing behavior,本 PR 未改变事件顺序,不是 regression
  • E2E 测试未更新:已经在 test_reasoning_protocol.py 中更新了,Copilot 判断有误

🤖 Reviewed by Cortex + Claude Code

@OhYee OhYee merged commit 288406f into main Jun 2, 2026
2 of 3 checks passed
@OhYee OhYee deleted the support_reasoning_content branch June 2, 2026 13:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants